AITopics | target function

We study feature learning in two-layer neural networks within the linear-width regime, where the number of hidden neurons, sample size, and input dimension scale proportionally. While recent work has analyzed feature learning via a single step of gradient descent on the first layer weights in this regime, such one-step update schemes are fundamentally limited: the update to the weights is approximately rank-one, captures only a single direction, and requires the target function to have an information exponent of one. In this paper, we go beyond one-step updates to provide a full characterization of the features learned during the \textit{second step} of gradient descent with step-sizes $η_1\asymp N^{α_1}$ and $η_2 \asymp N^{α_2}$ for $α_1, α_2 \in [0,0.5)$, where $N$ is the number of hidden neurons. We derive a spectral characterization of the updated weights, demonstrating they behave as a spiked random matrix with multiple outliers, each corresponding to a learned direction. We show that the number of the outliers is determined by the parameters $α_1, α_2$ through $\lfloor \frac{α_2}{1/2 - α_1} \rfloor$. Furthermore, by analyzing the alignment between the learned directions and the target function, we identify a gap between training with independent versus reused batches. While independent batches restrict learning to directions with an information exponent of one, batch reuse enables the second update to capture directions even when the information exponent exceeds one, provided that $α_1, α_2$ are chosen properly. This shows that the benefits of batch reuse, previously observed in narrow-width regimes, persist in the linear-width limit as well. By characterizing these early-phase evolutions, our work proposes a tractable framework for studying optimization and feature learning phenomenology in modern overparameterized networks.

artificial intelligence, machine learning, neural network, (13 more...)

arXiv.org Machine Learning

2605.17767

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)

Add feedback

0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsMay-1-2026, 01:32:36 GMT

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Power and limitations of single-qubit native quantum neural networks

Neural Information Processing SystemsApr-27-2026, 05:55:21 GMT

Quantum neural networks (QNNs) have emerged as a leading strategy to establish applications in machine learning, chemistry, and optimization. While the applications of QNN have been widely investigated, its theoretical foundation remains less understood. In this paper, we formulate a theoretical framework for the expressive ability of data re-uploading quantum neural networks that consist of interleaved encoding circuit blocks and trainable circuit blocks. First, we prove that single-qubit quantum neural networks can approximate any univariate function by mapping the model to a partial Fourier series. We in particular establish the exact correlations between the parameters of the trainable gates and the Fourier coefficients, resolving an open problem on the universal approximation property of QNN. Second, we discuss the limitations of single-qubit native QNNs on approximating multivariate functions by analyzing the frequency spectrum and the flexibility of Fourier coefficients. We further demonstrate the expressivity and limitations of single-qubit native QNNs via numerical experiments. We believe these results would improve our understanding of QNNs and provide a helpful guideline for designing powerful QNNs for machine learning tasks.

artificial intelligence, machine learning, qnn, (17 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.46)

Industry: Education > Curriculum > Subject-Specific Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

51311013e51adebc3c34d2cc591fefee-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 21:40:47 GMT

artificial intelligence, machine learning, representation, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

51311013e51adebc3c34d2cc591fefee-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 21:40:43 GMT

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.28)
North America > United States > Maryland > Prince George's County > College Park (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Neural Information Processing SystemsApr-25-2026, 20:19:31 GMT

Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (that relates the test error t P β to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher tis smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

4b5deb9a14d66ab0acc3b8a2360cde7c-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 18:53:27 GMT

For certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization, but for the networks used in practice, the empirical NTK only provides a rough first-order approximation. Still, a growing body of work keeps leveraging this approximation to successfully analyze important deep learning phenomena and design algorithms for new applications. In our work, we provide strong empirical evidence to determine the practical validity of such approximation by conducting a systematic comparison of the behavior of different neural networks and their linear approximations on different tasks. We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks, even when they achieve very different performances. However, in contrast to what was previously reported, we discover that neural networks do not always perform better than their kernel approximations, and reveal that the performance gap heavily depends on architecture, dataset size and training task. We discover that networks overfit to these tasks mostly due to the evolution of their kernel during training, thus, revealing a new type of implicit bias.

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Country: